GES668: Property Assessments in Baltimore

Joshua Spokes


Overview

Are certain types of properties getting a free ride on their taxes?

Project Origin

Project goals

  1. Identify tranches of property suspected to follow this pattern (Vacant, Auto-Oriented, etc.)

  2. Match these patterns to recorded sales

  3. Aggregate sales data by block group

Vacants

Changes in Scope

  • Analysis was limited to the last five years

  • Analysis limited by data quality

  • Aggregation switched from census tracts to neighborhoods
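
The five-year window in the first bullet can be applied up front with a simple filter. A minimal sketch, assuming a `sales` data frame with a `date` column (both names hypothetical here):

```r
library(dplyr)
library(lubridate)

# Keep only sales recorded in the last five years
# (`sales` and `date` are assumed names for this sketch)
recent_sales <- sales |>
  filter(date >= today() - years(5))
```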

Data sources

  • Maryland CAMA/Assessment data is all public domain

  • Baltimore DHCD data licensed under Creative Commons 3.0

  • Baltimore Vacants data available under a DOI

Baltimore Real Property

Baltimore’s real property dataset provides additional attributes not available at the state level, such as neighborhood and whether a property is unimproved.

Baltimore Vacant Building Notices

All vacant building notices issued in Baltimore back to the 1970s

Maryland Property Assessment

This data shows the property attributes of every piece of real property in Maryland, which includes location, assessment values, and some building characteristics.

Maryland CAMA Data

Building Characteristics

Core

Baltimore Neighborhoods

Analysis Approach

  • Combination of exploratory analysis and visualization

  • Relying heavily on dplyr and sf libraries for analysis

  • rdeck and tmap used for visualization
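
As an illustration of the visualization side, here is a minimal tmap sketch (v3 syntax), assuming `neighborhood_stats` is an sf polygon layer with a `vacant_med_price_ratio` column like the one built in the aggregation step:

```r
library(tmap)

# Choropleth of the median assessment-to-price ratio for vacant sales
# (assumes `neighborhood_stats` is an sf object with this column)
tm_shape(neighborhood_stats) +
  tm_polygons("vacant_med_price_ratio", title = "Median ratio") +
  tm_layout(legend.outside = TRUE)
```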

Identifying sales of vacant property

sales_valid_date |>
  left_join(dates, by = join_by(property == BLOCKLOT)) |>
  mutate(
    # A sale counts as vacant if it falls between the notice date
    # and the notice's termination (or the notice is still open)
    vacant_at_sale = (date >= DateNotice) &
      (is.na(date_terminate) | date <= date_terminate)
  ) |>
  group_by(property, sale) |>
  summarise(across(-vacant_at_sale, first),
            # any() over all matched notices, not just the first row
            vacant_at_sale = any(vacant_at_sale, na.rm = TRUE)
  ) |>
  ungroup() -> vacant_sale

I was scratching my head over how to approach this, so I asked ChatGPT [@ChatGPT] for help, which resulted in this elegant method for determining where there was a sale of vacant property.

Categorizing other sales

left_join(sales, property_land_use, by = join_by(property == BLOCKLOT)) |>
  group_by(property, transfer_no) |>
  summarise(date = first(date),
            price = first(price),
            block = first(block.x),
            property = first(property),
            acct_id_full = first(acct_id_full.x),
            vacant_at_sale = first(vacant_at_sale),
            Land_Value = first(Land_Value),
            Improvement_Value = first(Improvement_Value),
            Total_Assessment = first(Total_Assessment),
            NEIGHBOR = first(NEIGHBOR),
            BL_DSCTYPE = first(BL_DSCTYPE),
            BL_DSCSTYL = first(BL_DSCSTYL),
            CM_DSCIUSE = first(CM_DSCIUSE),
            NO_IMPRV = first(NO_IMPRV),
            .groups = "keep") |>
  mutate(identifier = case_when(
    (NO_IMPRV == "Y" & is.na(BL_DSCTYPE))    ~ "unimproved",
    vacant_at_sale                           ~ "vacant",
    str_detect(BL_DSCTYPE, "AUTO|WAREHOUSE") ~ "unperforming",
    .default = "regular"),
    price_ratio = Total_Assessment / price) |>
  ungroup() -> all_sales
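
The `price_ratio` computed above compares the assessed value to the actual sale price, so a ratio well below 1 suggests a property sold for much more than its assessment. A toy example with made-up numbers:

```r
# Hypothetical sale: assessed at $50,000, sold for $200,000
Total_Assessment <- 50000
price <- 200000
price_ratio <- Total_Assessment / price
price_ratio  # 0.25: the assessment captures a quarter of the market price
```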

Aggregation by neighborhood

all_sales |>
  group_by(NEIGHBOR, identifier) |>
  summarise(med_price_ratio = median(price_ratio),
            mean_price_ratio = mean(price_ratio),
            med_price = median(price),
            mean_price = mean(price),
            n = n(),
            .groups = "keep") |>
  pivot_wider(id_cols = "NEIGHBOR",
              names_from = "identifier",
              names_glue = "{identifier}_{.value}",
              values_from = c("med_price_ratio",
                              "mean_price_ratio",
                              "med_price",
                              "mean_price",
                              "n")) %>%
  left_join(neighborhoods, ., by = join_by(Name == NEIGHBOR)) |>
  mutate(pct_blk = Blk_AfAm / Population,
         pct_wht = White / Population) |>
  select(Name,
         Population,
         pct_blk,
         pct_wht,
         starts_with(c("vacant",
                       "unimproved",
                       "unperforming",
                       "regular"))) -> neighborhood_stats

Challenges in working with data

  • Baltimore City and Maryland State data use separate keys (BLOCKLOT and Account ID)

  • I frequently found myself backtracking to add attributes to various intermediate datasets

  • Getting to the end and realizing that the data isn’t clean enough to support firm conclusions
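
The key mismatch in the first bullet above can be bridged with a join on a shared account number, mirroring the `join_by()` pattern used elsewhere in the analysis. A minimal sketch, where `city_parcels`, `state_assessments`, `acct_id`, and `Account_ID` are all hypothetical names:

```r
library(dplyr)

# Attach state assessment attributes to city parcel records
# (all table and column names here are assumptions for illustration)
parcels_joined <- city_parcels |>
  left_join(state_assessments,
            by = join_by(acct_id == Account_ID))
```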

Successes in working with data

I think my analysis does a good job of taking a very big dataset and distilling it into a more usable format. I aimed to make the data tidier so that it would be easier to draw comparisons.

